2 December 2009 



SINGLE-QUERY LEARNING 
FROM ABELIAN AND NON-ABELIAN 
HAMMING DISTANCE ORACLES 

David A. Meyer* and James Pommersheim†

*Project in Geometry and Physics, Department of Mathematics
University of California/San Diego, La Jolla, CA 92093-0112 

†Department of Mathematics
Reed College, Portland, OR 97202-8199 

dmeyer@math.ucsd.edu, jamie@reed.edu



ABSTRACT 

We study the problem of identifying an n-bit string using a single quantum query to an 
oracle that computes the Hamming distance between the query and hidden strings. The 
standard action of the oracle on a response register of dimension r is by powers of the cycle 
$(1 \dots r)$, all of which, of course, commute. We introduce a new model for the action of an
oracle, by general permutations in $S_r$, and explore how the success probability depends
on r and on the map from Hamming distances to permutations. In particular, we prove 
that when r = 2, for even n the success probability is 1 with the right choice of the map, 
while for odd n the success probability cannot be 1 for any choice. Furthermore, for small 
odd n and r = 3, we demonstrate numerically that the image of the optimal map generates 
a non-abelian group of permutations. 



2010 Physics and Astronomy Classification Scheme: 03.67.Ac.

2010 American Mathematical Society Subject Classification: 68Q12, 68Q32, 05B20. 
Key Words: Quantum algorithms, permutation model. 






Learning from Hamming distance oracles 



Meyer & Pommersheim 



1. Introduction 

Suppose we wish to identify an n-bit string a by querying an oracle that computes 
the Hamming distance of any query x from a. Previous work has shown that if the oracle 
returns the Hamming distance modulo 4, there is a quantum algorithm that identifies a 
with probability 1, using only a single query [1]. On the other hand, if the oracle returns the 
Hamming distance modulo 2, there is no algorithm, either classical or quantum mechanical, 
that can identify a with probability greater than $1/2^{n-1}$, using any number of queries.¹
In the latter case, we can think of the oracle adding the Hamming distance into a
two-dimensional response register (so its remainder modulo 2 is all that matters), or we can
think of the oracle adding a single bit, the least significant bit of the Hamming distance,
into a two-dimensional response register. The latter point of view might lead us to believe
that the difficulty stems from the oracle returning only a single bit, compared to the two 
bits that it returns when it computes the Hamming distance modulo 4. 

Our first, possibly surprising, result demonstrates that when n is even this belief is 
wrong — there is a quantum algorithm that takes a single bit from the Hamming distance 
computed by the oracle and identifies a with probability 1 using a single query. Knowing 
such an algorithm exists, our second result is perhaps equally surprising: when n is odd the 
original belief is at least partially correct — there is no probability 1 algorithm for finding a 
using any single bit of the Hamming distance. By "any single bit of the Hamming distance" 
we mean any function 

$$ g_a(x) = h(\mathrm{dist}(a,x)), \qquad (1.1) $$

where $h : \{0, \dots, n\} \to \{0, 1\}$. Both of these results involve learning (or failing to learn)
an element from a set of binary functions of x, indexed by a, so they can be understood 
as solutions to problems in computational learning theory where the set is a concept class 
and its elements are concepts [2]. 

Combining our new results with the previous ones leads us to make two observations: 
the probability of correctly learning a depends on (1) the dimension of the response register 
and (2) how the oracle's response acts on this register. The first observation suggests 
generalizing the notion of concepts, which are binary functions, to Y-valued functions,
for sets Y other than {0, 1}.² The second observation motivates the main conceptual
contribution of this paper, a new model for the action of quantum (and reversible classical)
non-abelian oracles: the permutation model. In this model, we fix a response register $\mathbb{C}^R$,
where R is a finite set, and assign to each possible response $y \in Y$ a permutation $\sigma_y \in S_R$ of
the set R. More precisely, to implement an oracle which computes the (classical) function
$f : X \to Y$, we are free to choose any map $\sigma : Y \to S_R$, and given this choice, the oracle
acts on $\mathbb{C}^X \otimes \mathbb{C}^R$ by

$$ O(f)|x,b\rangle = |x, \sigma_{f(x)}(b)\rangle. $$
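To make the permutation model concrete, here is a small Python sketch (our illustration, not code from the paper) that builds O(f) as a permutation of the basis labels (x, b). The helper names `hamming`, `permutation_oracle` and `compose`, the choice n = 2, the hidden string a = (1, 0), and the labelling of R as {0, 1, 2} rather than {1, ..., r} are all assumptions made for the example.

```python
from itertools import product


def hamming(a, x):
    """Hamming distance between two equal-length bit tuples."""
    return sum(ai != xi for ai, xi in zip(a, x))


def permutation_oracle(f, sigma, X, R):
    """Action of O(f) on basis labels: (x, b) -> (x, sigma[f(x)][b]).

    sigma maps each response value y to a permutation of R, stored as a
    dict {b: sigma_y(b)}. Since every sigma_y is a bijection of R, the
    returned map is a bijection of X x R, so its linear extension is a
    permutation matrix on C^X (x) C^R.
    """
    return {(x, b): (x, sigma[f(x)][b]) for x, b in product(X, R)}


def compose(s, t):
    """The permutation 'first t, then s'."""
    return {b: s[t[b]] for b in t}


# Hypothetical example: n = 2, hidden string a = (1, 0), R = {0, 1, 2}.
n, R = 2, (0, 1, 2)
X = list(product((0, 1), repeat=n))
a = (1, 0)
sigma = {
    0: {0: 0, 1: 1, 2: 2},  # identity
    1: {0: 1, 1: 0, 2: 2},  # transposition of 0 and 1
    2: {0: 1, 1: 2, 2: 0},  # 3-cycle 0 -> 1 -> 2 -> 0
}
O = permutation_oracle(lambda x: hamming(a, x), sigma, X, R)

# sigma_1 and sigma_2 do not commute, so this oracle is non-abelian.
noncommuting = compose(sigma[1], sigma[2]) != compose(sigma[2], sigma[1])
```

Because every basis label is sent to exactly one basis label, the linear extension of this map is unitary for any choice of (possibly non-commuting) permutations.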



1 This follows from the fact that the weight of a modulo 2 partitions the set of n-bit strings into two
subsets of size $2^{n-1}$, with each element having even Hamming distance from the elements in the
same subset and odd Hamming distance from the elements in the other subset.

2 Limited versions of this generalization have been considered previously. See, for example, [3,4]. 
When the oracle acts in the standard way, by adding the function value it computes into the
response register, $\sigma(Y) \subseteq C_r < S_r$, where $C_r$ is the cyclic group with $r = |R|$ elements,
so $\sigma(Y)$ generates an abelian group. But this need not be the case: $\sigma(Y)$ can generate a
non-abelian subgroup of $S_r$ when $r > 2$, and for some problems the optimal solution has this property.

Our final set of results addresses the problem of maximizing the probability of success 
for odd n within this permutation model. We emphasize that although in this paper we 
study only Hamming distance oracles, any non-trivial oracle can be set up to have a non- 
abelian action, and this can improve the probability of success relative to an abelian action, 
as it does for the oracles we consider. 

2. Background 

Most quantum algorithms include one or more calls to a subroutine or oracle that 
evaluates some function at the argument passed to it. In some cases, like Shor's algorithm 
[5] and the various quantum algorithms for hidden subgroup problems [6], the range of 
this function is a large set Y (so that, for example, the function can take distinct values on 
distinct cosets of the hidden subgroup). In others, like Grover's algorithm [7], the range 
of the function is only {0, 1}. 

In the latter cases, the problem of identifying the function can be recognized as a 
problem in computational learning theory [8]: the set of possible functions $C \subseteq \{0, 1\}^X$,
where X is the domain of the function, is the concept class; each function $c : X \to \{0, 1\}$
is a concept; and $c^{-1}(1) \subseteq X$ is the extension of the concept c. Concept learning is the
process by which a student (the learner) identifies (or approximates) a target concept c
from a concept class C. In active learning the student can query a teacher for information
about the target concept. Asking a teacher if $x \in X$ is in the extension of c is equivalent
to passing x to a subroutine or oracle that evaluates c at its argument.

Many natural concept learning problems (including Grover's [7] UNSTRUCTURED
SEARCH problem; Bernstein and Vazirani's [9], and Barg and Zhou's [10], SIMPLEX CODE
DECODING problem; and Hunziker et al.'s [8] BATTLESHIP and MAJORITY problems) are
highly symmetric. In each of these $|C| = |X|$, and there is a transitive action of an abelian group G
on C and X that satisfies $(g \cdot c)(g \cdot x) = c(x)$ for all $c \in C$, $x \in X$, and $g \in G$.

In this paper we consider problems which have this symmetry for $G = X = \mathbb{Z}_2^n$. Each
involves a specific function of the Hamming distance between some unknown n-bit string
$a \in \mathbb{Z}_2^n$ and $x \in \mathbb{Z}_2^n$, $\mathrm{dist}(a,x) = |\{i \mid a_i \neq x_i\}|$; this is invariant under the action of G
since $\mathrm{dist}(g + a, g + x) = \mathrm{dist}(a, x)$. Now, until it is composed with a binary function as in
(1.1), $\mathrm{dist}(a, \cdot) : X \to \{0, \dots, n\} = Y$ does not define a traditional concept (except in the
trivial case n = 1), so it is useful to define a Y-valued concept class to be a set of functions
$C \subseteq Y^X$. We extend our use of "learning problems" to include these cases.

DEFINITION. An (n, r)-Hamming distance oracle accepts queries $x \in \mathbb{Z}_2^n$ and then acts on
an r-dimensional response register according to some function of the Hamming distance
dist(a, x), for some fixed $a \in \mathbb{Z}_2^n$.³



Our goal is to optimize single-query learning from such Hamming distance oracles, i.e., 
to maximize the probability of correctly identifying a after a single call to the subroutine 
that computes the function. Since we assume a uniform distribution on a, we consider 
only quantum algorithms that begin with an equal superposition query,⁴ i.e., that pass
to the oracle a state of the form $|\eta\rangle \otimes \psi = H^{\otimes n}|0\dots0\rangle \otimes \psi$, where H is the Hadamard
transformation $\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\1&-1\end{pmatrix}$ and $\psi \in \mathbb{C}^r$. If $O(a) : (\mathbb{C}^2)^{\otimes n} \otimes \mathbb{C}^r \to (\mathbb{C}^2)^{\otimes n} \otimes \mathbb{C}^r$ denotes the
action of the oracle with parameter a, the problem reduces to identifying which of the $2^n$
states $O(a)|\eta\rangle \otimes \psi$ is returned by the oracle. An optimal solution to this problem can be
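obtained by a complete von Neumann measurement [11,12,13,14]; equivalently, we want to
maximize

$$ \sum_{a=0}^{2^n-1} \sum_{b=0}^{r-1} \left| \langle a,b|\,U\,O(a)\,|\eta\rangle \otimes \psi \right|^2 \qquad (2.1) $$

over all unitary maps $U \in U(2^n r)$ and states $\psi \in \mathbb{C}^r$.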



3. Using a different bit of the Hamming distance 

We begin by considering the problem of learning an n-bit string from an oracle that 
returns the second least significant bit of the Hamming distance of a query, rather than 
the least significant bit as in [1]. To be precise, let n be a natural number, and for any
$a \in \mathbb{Z}_2^n$, define a function $f_a : \mathbb{Z}_2^n \to \{0, 1\}$ by

$$ f_a(x) = \begin{cases} 0 & \text{if } \mathrm{dist}(a,x) \equiv 0, 1 \pmod 4; \\ 1 & \text{if } \mathrm{dist}(a,x) \equiv 2, 3 \pmod 4. \end{cases} $$

Thus $f_a(x)$ is the second least significant bit of the Hamming distance between a and
x. Set $b_1(d)$ to be the second least significant bit of a nonnegative integer d, so $f_a(x) =
b_1(\mathrm{dist}(a,x))$. Define $C_n$ to be the concept class $\{f_a \mid a \in \mathbb{Z}_2^n\}$.

LEMMA 3.1. If $n \not\equiv 1 \pmod 4$ then $|C_n| = 2^n$. If $n \equiv 1 \pmod 4$ then $f_a = f_{\bar{a}}$, where $\bar{a}$ is
the bitwise complement of $a \in \mathbb{Z}_2^n$, so there are only $2^{n-1}$ concepts in the class.

Proof. Suppose $f_{a'} = f_a$ and $\mathrm{dist}(a,a') = d$. Since $b_1(d) = b_1(\mathrm{dist}(a',a)) = f_{a'}(a) =
f_a(a) = b_1(\mathrm{dist}(a,a)) = b_1(0) = 0$, we must have $d \equiv 0$ or $1 \pmod 4$. If $a' \neq a$ there is a
bit at which a' differs from a. Let x be the bit string obtained from a by complementing this
bit. Then $b_1(\mathrm{dist}(a,x)) = b_1(1) = 0$, so $b_1(\mathrm{dist}(a',x)) = b_1(d-1) = 0$, so we can conclude
that $d \equiv 1 \pmod 4$. Now suppose there were a bit at which a' agreed with a. Let y be
the bit string obtained from a by complementing this bit. Then $b_1(\mathrm{dist}(a,y)) = b_1(1) = 0$



3 It is also natural to consider problems with $X = \mathbb{Z}_k^n$, in which the Hamming distance is defined
by the same formula, but in this paper we restrict our attention to k = 2.

4 In fact, we conjecture that for problems with transitive group actions and uniform priors, the optimal 
solutions always include one that begins with an equal superposition query. 
and $b_1(\mathrm{dist}(a',y)) = b_1(d+1)$, which, together with $d \equiv 0$ or $1 \pmod 4$, would force $d \equiv 0 \pmod 4$,
a contradiction. So if $a' \neq a$ but $f_{a'} = f_a$, there can be no bit at which a' agrees with a, which
means $a' = \bar{a}$, so $d = n$ and $n \equiv 1 \pmod 4$. ∎
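Lemma 3.1 can be sanity-checked by brute force for small n. The following sketch (our own illustration; the helper names are hypothetical) counts the distinct truth tables of $f_a$. By the lemma, the count should be $2^{n-1}$ when $n \equiv 1 \pmod 4$ and $2^n$ otherwise.

```python
from itertools import product


def b1(d):
    """Second least significant bit of the nonnegative integer d."""
    return (d >> 1) & 1


def hamming(a, x):
    """Hamming distance between two equal-length bit tuples."""
    return sum(ai != xi for ai, xi in zip(a, x))


def concept_class_size(n):
    """|C_n|: number of distinct functions f_a(x) = b1(dist(a, x)), a in Z_2^n."""
    strings = list(product((0, 1), repeat=n))
    # Each concept f_a is represented by its truth table over all queries x.
    tables = {tuple(b1(hamming(a, x)) for x in strings) for a in strings}
    return len(tables)


sizes = {n: concept_class_size(n) for n in range(1, 8)}
```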

As we explained in the previous sections, we are interested in analyzing the probability 
of correctly identifying the hidden bit string a using only a single query to the oracle. 
Classically, it is not hard to see that when the $f_a$ are distinct, any strategy yields a
worst-case success probability of at most $2/2^n = 1/2^{n-1}$, the number of possible oracle responses
divided by the number of concepts. In contrast, we next show that for even n, this learning 
problem can be solved quantum mechanically with probability 1 using a single query. 

THEOREM 3.2. Let n be even. Then the learning problem defined by C n can be solved 
with probability 1 using a single quantum query. 

We will prove Theorem 3.2 by giving an explicit algorithm below. To show that the 
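algorithm is correct we will need two lemmas. For $x \in \mathbb{Z}_2^n$, define $\tilde{x} \in \mathbb{Z}_2^n$ by

$$ \tilde{x} = \begin{cases} x & \text{if } \mathrm{wt}(x) \text{ is even}; \\ \bar{x} & \text{if } \mathrm{wt}(x) \text{ is odd}, \end{cases} $$

where $\bar{x}$ denotes the bitwise complement of x. Here the weight of x is $\mathrm{wt}(x) = \mathrm{dist}(0, x)$.
Note that if n is even, then the function $x \mapsto \tilde{x}$ is a permutation of $\mathbb{Z}_2^n$, since
complementing then preserves the parity of the weight.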

LEMMA 3.3. Let n be a natural number and let $a, x \in \mathbb{Z}_2^n$. Then

$$ a \cdot x + \mathrm{wt}(a)\,\mathrm{wt}(x) \equiv a \cdot \tilde{x} \pmod 2. $$

Proof. If wt(x) is even, then $\tilde{x} = x$, and the congruence is easily seen to hold. If wt(x) is
odd, then $\tilde{x} = \bar{x}$, and the congruence follows from the identity $a \cdot x + a \cdot \bar{x} = \mathrm{wt}(a)$. ∎
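A brute-force check of Lemma 3.3, and of the remark that $x \mapsto \tilde{x}$ is a permutation when n is even (and fails to be one when n is odd), might look like this sketch. Bit strings are tuples, and the helper names are our own assumptions.

```python
from itertools import product


def dot(a, x):
    """Integer inner product a . x (reduce mod 2 where needed)."""
    return sum(ai * xi for ai, xi in zip(a, x))


def wt(x):
    return sum(x)


def tilde(x):
    """x if wt(x) is even, otherwise the bitwise complement of x."""
    return x if wt(x) % 2 == 0 else tuple(1 - xi for xi in x)


def lemma_3_3_holds(n):
    strings = list(product((0, 1), repeat=n))
    return all((dot(a, x) + wt(a) * wt(x)) % 2 == dot(a, tilde(x)) % 2
               for a in strings for x in strings)


def tilde_is_permutation(n):
    strings = list(product((0, 1), repeat=n))
    return len({tilde(x) for x in strings}) == len(strings)
```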

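LEMMA 3.4. Let n be a natural number and let $a, x \in \mathbb{Z}_2^n$. Then

$$ b_1(\mathrm{dist}(a,x)) \equiv b_1(\mathrm{wt}(a)) + b_1(\mathrm{wt}(x)) + a \cdot \tilde{x} \pmod 2. $$

Proof. First note that for any nonnegative integer d,

$$ b_1(d) \equiv \frac{d(d-1)}{2} \pmod 2. $$

Since $\mathrm{dist}(a,x) = \mathrm{wt}(a) + \mathrm{wt}(x) - 2(a \cdot x)$, this implies

$$ b_1(\mathrm{dist}(a,x)) \equiv \frac{(\mathrm{wt}(a)+\mathrm{wt}(x)-2(a\cdot x))(\mathrm{wt}(a)+\mathrm{wt}(x)-2(a\cdot x)-1)}{2} \pmod 2. $$

Expanding the numerator on the right hand side of this congruence, and dropping multiples
of 4, gives

$$ b_1(\mathrm{dist}(a,x)) \equiv \frac{\mathrm{wt}(a)^2 - \mathrm{wt}(a) + \mathrm{wt}(x)^2 - \mathrm{wt}(x)}{2} + \mathrm{wt}(a)\,\mathrm{wt}(x) + a \cdot x \pmod 2. $$

By the first congruence, $(\mathrm{wt}(a)^2 - \mathrm{wt}(a))/2 \equiv b_1(\mathrm{wt}(a))$, and likewise for x. Using Lemma 3.3,
we can replace $\mathrm{wt}(a)\,\mathrm{wt}(x) + a \cdot x$ with $a \cdot \tilde{x}$, and the result follows. ∎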
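The congruence $b_1(d) \equiv d(d-1)/2 \pmod 2$ and Lemma 3.4 can likewise be verified exhaustively for small n. This is only a numerical sanity check under our own encoding (tuples of bits, hypothetical helper names), not a substitute for the proof.

```python
from itertools import product


def b1(d):
    """Second least significant bit of the nonnegative integer d."""
    return (d >> 1) & 1


def wt(x):
    return sum(x)


def dist(a, x):
    return sum(ai != xi for ai, xi in zip(a, x))


def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def tilde(x):
    """x if wt(x) is even, otherwise the bitwise complement of x."""
    return x if wt(x) % 2 == 0 else tuple(1 - xi for xi in x)


def b1_formula_holds(max_d=64):
    """Check b_1(d) == d(d-1)/2 mod 2 for 0 <= d < max_d."""
    return all(b1(d) == (d * (d - 1) // 2) % 2 for d in range(max_d))


def lemma_3_4_holds(n):
    strings = list(product((0, 1), repeat=n))
    return all(b1(dist(a, x)) == (b1(wt(a)) + b1(wt(x)) + dot(a, tilde(x))) % 2
               for a in strings for x in strings)
```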

Proof of Theorem 3.2. We take the oracle to act on $(\mathbb{C}^2)^{\otimes n} \otimes \mathbb{C}^2$ in the standard way,

$$ O(a) : |x\rangle|b\rangle \mapsto |x\rangle|b + f_a(x)\rangle, $$

although it is $b_1(\mathrm{dist}(a,x))$ that is being added into the response register, not dist(a, x).
The following quantum algorithm identifies a with probability 1, applying O(a) only once. 

Algorithm A. 

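1. Initialize the state to $|0\dots0\rangle|0\rangle \in (\mathbb{C}^2)^{\otimes n} \otimes \mathbb{C}^2$.

2. Apply the unitary transformation $H^{\otimes n} \otimes HX$, where $X = \begin{pmatrix}0&1\\1&0\end{pmatrix}$. This produces the
state

$$ \frac{1}{2^{n/2}} \sum_{x \in \mathbb{Z}_2^n} |x\rangle|-\rangle, $$

where $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$.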
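3. Let D be the diagonal matrix acting on $(\mathbb{C}^2)^{\otimes n}$ by $D|x\rangle = (-1)^{b_1(\mathrm{wt}(x))}|x\rangle$. Apply
$D \otimes I$, producing the state

$$ \frac{1}{2^{n/2}} \sum_{x \in \mathbb{Z}_2^n} (-1)^{b_1(\mathrm{wt}(x))} |x\rangle|-\rangle. $$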

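4. Apply the oracle O(a). Since $O(a)|x\rangle|-\rangle = (-1)^{f_a(x)}|x\rangle|-\rangle$, this produces the state

$$ \frac{1}{2^{n/2}} \sum_{x \in \mathbb{Z}_2^n} (-1)^{b_1(\mathrm{wt}(x))} (-1)^{b_1(\mathrm{dist}(a,x))} |x\rangle|-\rangle. $$

By Lemma 3.4, this equals

$$ \frac{1}{2^{n/2}} \sum_{x \in \mathbb{Z}_2^n} (-1)^{b_1(\mathrm{wt}(a))} (-1)^{a \cdot \tilde{x}} |x\rangle|-\rangle. $$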

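5. Let P be the permutation matrix acting on $(\mathbb{C}^2)^{\otimes n}$ by $P|x\rangle = |\tilde{x}\rangle$. Applying $P \otimes I$
yields

$$ \frac{1}{2^{n/2}} \sum_{x \in \mathbb{Z}_2^n} (-1)^{b_1(\mathrm{wt}(a))} (-1)^{a \cdot \tilde{x}} |\tilde{x}\rangle|-\rangle, $$

which is equal to

$$ \frac{(-1)^{b_1(\mathrm{wt}(a))}}{2^{n/2}} \sum_{x \in \mathbb{Z}_2^n} (-1)^{a \cdot x} |x\rangle|-\rangle, $$

since $x \mapsto \tilde{x}$ is a bijection (n is even).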
6. Apply $H^{\otimes n} \otimes I$. This produces the state $(-1)^{b_1(\mathrm{wt}(a))}|a\rangle|-\rangle$.

7. Now measure the query register (the $(\mathbb{C}^2)^{\otimes n}$ tensor factor) and observe a with
probability 1. ∎
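The steps of Algorithm A can be followed numerically with a small state-vector simulation. In this pure-Python sketch (our illustration, not code from the paper) bit strings are encoded as integers, `state[x][b]` holds the amplitude of $|x\rangle|b\rangle$, and each gate is applied by direct summation, which is only practical for small n. The function names are hypothetical.

```python
import math


def b1(d):
    """Second least significant bit of the nonnegative integer d."""
    return (d >> 1) & 1


def popcount(x):
    """Weight of the bit string encoded by the integer x."""
    return bin(x).count("1")


def hadamard_on_query(state, n):
    """Apply H^{(x)n} to the query register; state[x][b] is the amplitude of |x>|b>."""
    N = 1 << n
    scale = 1 / math.sqrt(N)
    return [[scale * sum((-1) ** popcount(x & y) * state[x][b] for x in range(N))
             for b in range(2)]
            for y in range(N)]


def algorithm_a(n, a):
    """Simulate Algorithm A for even n and hidden string a (an integer < 2^n).

    Returns the probability that measuring the query register in step 7 yields a.
    """
    N, mask = 1 << n, (1 << n) - 1

    def f_a(x):  # f_a(x) = b_1(dist(a, x))
        return b1(popcount(a ^ x))

    def tilde(x):  # x if wt(x) is even, else the complement of x
        return x if popcount(x) % 2 == 0 else x ^ mask

    # Step 1: |0...0>|0>.
    state = [[0.0, 0.0] for _ in range(N)]
    state[0][0] = 1.0
    # Step 2: H^{(x)n} on the query register; HX sends the response bit |0> to |->.
    state = hadamard_on_query(state, n)
    s = 1 / math.sqrt(2)
    state = [[s * (v1 + v0), s * (v1 - v0)] for v0, v1 in state]
    # Step 3: D (x) I with D|x> = (-1)^{b_1(wt(x))}|x>.
    state = [[(-1) ** b1(popcount(x)) * amp for amp in state[x]] for x in range(N)]
    # Step 4: the single oracle call |x>|b> -> |x>|b XOR f_a(x)>.
    state = [[state[x][b ^ f_a(x)] for b in range(2)] for x in range(N)]
    # Step 5: P (x) I with P|x> = |x~>; a bijection because n is even.
    permuted = [None] * N
    for x in range(N):
        permuted[tilde(x)] = state[x]
    state = permuted
    # Step 6: H^{(x)n} on the query register.
    state = hadamard_on_query(state, n)
    # Step 7: probability of observing a in the query register.
    return sum(abs(amp) ** 2 for amp in state[a])
```

For example, `algorithm_a(4, 0b1011)` should return 1 up to floating-point rounding. For odd n the map in step 5 is not a bijection, and the sketch deliberately does not handle that case.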

4. Concept classes that cannot be learned with a single query 

Theorem 3.2 cannot be extended to odd n > 1; there is no single equal superposition 
query probability 1 quantum learning algorithm for the concept class C n in this case. In 
fact, when n > 1 is odd, there is no concept class defined by any function of the Hamming 
distance that is perfectly learnable with a single equal superposition quantum query. To 
see this, we begin with the following lemma: 

LEMMA 4.1. Let C be a concept class of size M over a set X of size N. Suppose that there
is a probability 1 learning algorithm using a single equal superposition quantum query.
Identifying concepts with bit strings indexed by X, there exists an integer $d \ge N/2$ such
that any two distinct concepts of C have Hamming distance d. If $M = N > 2$ is even,
then the quantum learning matrix, which has entries $L_{xc} = (-1)^{c(x)}$ for $x \in X$, $c \in C$, is
a Hadamard matrix.

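Proof. Suppose that there is a single query learning algorithm with equal superposition
query

$$ \frac{1}{\sqrt{N}} \sum_{x \in X} |x\rangle \otimes \psi $$

for some unit vector $\psi \in \mathbb{C}^2$. If $\lambda = \psi^\dagger X \psi$, then $-1 \le \lambda \le 1$.⁵ Let A be the matrix whose
columns, indexed by concepts, contain the state of the system after querying the oracle.
Then $B = A^\dagger A$ is a matrix whose rows and columns are both indexed by concepts, with
elements

$$ B_{cc'} = \frac{1}{N} \sum_{x \in X} \begin{cases} 1 & \text{if } c(x) = c'(x); \\ \lambda & \text{if } c(x) \neq c'(x). \end{cases} $$

Thus $N B_{cc'} = (N - \mathrm{dist}(c,c')) + \lambda\,\mathrm{dist}(c,c')$. Since the algorithm succeeds with probability
1, the final states must be orthogonal, so $B_{cc'} = 0$ for distinct concepts $c \neq c'$. In this case

$$ d = \mathrm{dist}(c,c') = \frac{N}{1-\lambda} \ge \frac{N}{2}, $$

which is independent of c and c', and where the inequality follows from $\lambda \ge -1$.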

Now suppose that $M = N > 2$ is even. Note that the concepts of C form a code
of distance d. Hence if $d > N/2$, then the Plotkin bound [15] implies that

$$ M \le 2 \left\lfloor \frac{d}{2d - N} \right\rfloor. $$



5 This X is the bit-flip matrix defined in step 2 of Algorithm A, not the set over which the concept 
class is defined. 
Since N is even, $2d - N$ is a positive even integer, so $2d - N \ge 2$ and hence $M \le 2d/(2d-N) \le d$.
It follows that $M < N$ unless $d = N$, in which case $M \le 2$; either way this contradicts $M = N > 2$.
Thus we must have $d = N/2$, so distinct columns of L agree in exactly half of their entries and are
orthogonal. That is, if $M = N > 2$ is even, the quantum learning matrix is a Hadamard matrix. ∎

We now use Lemma 4.1 to prove the general result: 

THEOREM 4.2. Let n > 1 be odd. Suppose that $\mathcal{L}_n = \{g_a \mid a \in \mathbb{Z}_2^n\}$, where the functions
$g_a : \mathbb{Z}_2^n \to \mathbb{Z}_2$ have the property that $g_a(x)$ depends only on the Hamming distance
dist(a, x). If $|\mathcal{L}_n| = 2^n$, then the learning problem defined by $\mathcal{L}_n$ cannot be solved with
probability 1 using a single quantum query.

Note that if $|\mathcal{L}_n| \neq 2^n$, then a is not determined by $g_a$. Thus, in general, when n > 1 is
odd, the bit string a cannot be learned with probability 1 in a single quantum query from
any binary-valued function of the Hamming distance.

Proof. Since $g_a(x)$ depends only on the Hamming distance dist(a, x), there exists a function
$h : \{0, \dots, n\} \to \{0, 1\}$ such that $g_a(x) = h(\mathrm{dist}(a,x))$.

Suppose that the learning problem defined by $\mathcal{L}_n$ can be solved with probability 1
using a single quantum query. Then by Lemma 4.1, the quantum learning matrix L, with
elements $L_{xa} = (-1)^{g_a(x)}$, is a Hadamard matrix. Consider the inner product of the two
rows of L corresponding to the queries $y = 0^n$ and $z = 1^2 0^{n-2}$ (so z has ones exactly in bit
positions 0 and 1). Since L is a Hadamard matrix,

$$ \sum_{a \in \mathbb{Z}_2^n} (-1)^{g_a(y)} (-1)^{g_a(z)} = 0. $$

In half of the terms of this sum, those for which the bits $a_0$ and $a_1$ differ, $\mathrm{dist}(a,y) =
\mathrm{dist}(a,z)$. Then $g_a(y) = g_a(z)$, and hence each of these terms contributes +1 to the sum.
Since the sum vanishes, each term in the other half, those for which $a_0 = a_1$, must contribute
−1 to the sum, so $g_a(y) \equiv g_a(z) + 1 \pmod 2$. But $a_0 = a_1$ implies $\mathrm{dist}(a,y) = \mathrm{dist}(a,z) \pm 2$. It
follows that for any $d \in \{0, \dots, n-2\}$, $h(d) \neq h(d+2)$. Hence for some $s \in \{0, 1, 2, 3\}$,
$h(d) = b_1(d+s)$ for all $d \in \{0, \dots, n\}$. Thus under the assumption that the concept class
can be learned with probability 1 from a single quantum query, we have shown that h is a
translate of $b_1$.

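It remains to show that if n is odd, taking h to be a translate of $b_1$ leads to a matrix
L that is not a Hadamard matrix. One easily sees that for such a function h, there is a
sign $\epsilon = \pm 1$ such that

$$ (-1)^{h(n-d)} = \epsilon\,(-1)^{h(d)} $$

for all d: if $h(d) = b_1(d+s)$, the arguments $n-d+s$ and $d+s$ have the odd sum $n+2s$, and $b_1$
takes equal values on two residues mod 4 whose sum is 1 (mod 4) and opposite values on two
residues whose sum is 3 (mod 4). Since $\mathrm{dist}(a, \bar{x}) = n - \mathrm{dist}(a,x)$, it follows that any two rows
of L corresponding to complementary values of x are equal up to sign. Hence L is not a
Hadamard matrix. ∎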
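Since Lemma 4.1 turns probability-1 learning into the requirement that the learning matrix be Hadamard, the conclusion of Theorem 4.2 can be checked exhaustively for small odd n by trying every function $h : \{0, \dots, n\} \to \{0, 1\}$. The sketch below is our own brute-force illustration (hypothetical helper names, bit strings encoded as integers); for n = 3 and n = 5 it should find no such h.

```python
from itertools import product


def popcount(x):
    return bin(x).count("1")


def learning_matrix(n, h):
    """L[x][a] = (-1)^{h(dist(a, x))}, with bit strings encoded as integers."""
    N = 1 << n
    return [[(-1) ** h[popcount(a ^ x)] for a in range(N)] for x in range(N)]


def is_hadamard(L):
    """True iff the rows of the square +-1 matrix L are pairwise orthogonal."""
    N = len(L)
    return all(sum(L[i][k] * L[j][k] for k in range(N)) == (N if i == j else 0)
               for i in range(N) for j in range(i, N))


def hadamard_distance_functions(n):
    """All h: {0, ..., n} -> {0, 1} (as tuples) whose learning matrix is Hadamard."""
    return [h for h in product((0, 1), repeat=n + 1)
            if is_hadamard(learning_matrix(n, h))]
```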

When $n \equiv 3 \pmod 4$, the concept class $C_n$ we introduced in the previous section
satisfies the hypotheses of Theorem 4.2, so it cannot be learned with probability 1 from a
single quantum query. When $n \equiv 1 \pmod 4$, Lemma 3.1 tells us that the concept class has
only $2^{n-1}$ concepts, so Theorem 4.2 does not apply to learning the concept class $C_n$ in this
case. We already know in this case that a cannot be identified with probability greater
than 1/2 with any number of queries, since $f_a = f_{\bar{a}}$. Using Algorithm A (with appropriate
minor modifications), however, a single query determines a up to complementation, so the
concept class $C_n$ can be learned with a single quantum query.

Notice that we did not use the fact that n is odd to reach the conclusion that h is
a translate of $b_1$. This means that for even n, the Hamming distance concept class $C_n$ is
essentially the only one that can be learned with probability 1 using a single query. More
precisely, we have:

COROLLARY 4.3. When n is even, $b_1$ (and its translates) are the only functions of Hamming
distance that yield a concept class learnable with probability 1 using a single quantum
query.
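Corollary 4.3 can be illustrated in the same way for small even n. An exhaustive search, again our own sketch with hypothetical helper names, should return exactly the four translates $d \mapsto b_1(d+s)$, $s \in \{0, 1, 2, 3\}$.

```python
from itertools import product


def b1(d):
    """Second least significant bit of the nonnegative integer d."""
    return (d >> 1) & 1


def popcount(x):
    return bin(x).count("1")


def learning_matrix(n, h):
    """L[x][a] = (-1)^{h(dist(a, x))}, with bit strings encoded as integers."""
    N = 1 << n
    return [[(-1) ** h[popcount(a ^ x)] for a in range(N)] for x in range(N)]


def is_hadamard(L):
    """True iff the rows of the square +-1 matrix L are pairwise orthogonal."""
    N = len(L)
    return all(sum(L[i][k] * L[j][k] for k in range(N)) == (N if i == j else 0)
               for i in range(N) for j in range(i, N))


def hadamard_distance_functions(n):
    """All h: {0, ..., n} -> {0, 1} (as tuples) whose learning matrix is Hadamard."""
    return [h for h in product((0, 1), repeat=n + 1)
            if is_hadamard(learning_matrix(n, h))]


def translates_of_b1(n):
    """The functions d -> b_1(d + s) on {0, ..., n}, for s = 0, 1, 2, 3."""
    return {tuple(b1(d + s) for d in range(n + 1)) for s in range(4)}
```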

5. The permutation model 

The results of the previous section demonstrate that an n-bit string a cannot be 
learned with probability 1 using a single quantum query to an (n, 2)-Hamming distance 
oracle, when n is odd. A natural question, then, is: 

What is the largest probability with which a can be learned using a single quantum 
query to an (n, 2)-Hamming distance oracle? 

Furthermore, although previous work has shown that a can be learned with probability 1 
from an (n, 4)-Hamming distance oracle [1], neither that work nor our results to this point 
address the potential for learning with a 3-dimensional response register. So there is a 
second natural question: 

What is the largest probability with which a can be learned using a single quantum 
query to an (n, 3)-Hamming distance oracle? 

Before answering these questions, we reconsider the formulation of oracle algorithms. 

To allow comparison with the classical query complexity of oracle (learning) problems, 
the action of the oracle in a quantum algorithm must be the linear extension of a classical 
reversible operation. In Deutsch's [16] and Deutsch and Jozsa's [17] original quantum
algorithms for oracle problems, the oracle acts on $(\mathbb{C}^2)^{\otimes n} \otimes \mathbb{C}^2$ by

$$ O(c)|x,b\rangle = |x, b + c(x)\rangle, \qquad (5.1) $$

where the sum is computed modulo 2, but the second register is initialized to $|0\rangle$, so the
action has the effect of simply writing the function value computed by the oracle into that
register. Similarly, in quantum algorithms for hidden subgroup problems [18] the oracle
computes a function that is constant on cosets of the hidden subgroup, and takes distinct
values on distinct cosets, so it acts on $\mathbb{C}^N \otimes \mathbb{C}^r$, where N is the size of the group and r is
the number of distinct cosets, by (5.1), where the sum is computed modulo r. Again the
second register is initialized to $|0\rangle$, so this also has the effect of merely writing the function
value into that register.

Cleve et al.⁶ noticed that the success probability of Deutsch's original algorithm
could be improved to 1 by initializing the response register in the state $|-\rangle$, thereby taking
advantage of the action (5.1) when b = 1 as well as when b = 0, to "kick back" a phase
of $(-1)^{c(x)}$ [19]. Algorithm A does the same thing. This application, as opposed to the
application on the response register initialized to $|0\rangle$, emphasizes that O(c) acts as a map
on {0, 1} (the labels of the computational basis vectors of the $\mathbb{C}^2$ response register) and
is a classical reversible operation for each of the possible values of c(x): 0 acts as the
identity and 1 acts to exchange 0 and 1. That is, the oracle response, both classically
and quantum mechanically, can be thought of as an element of $S_2$, the permutations of a
two-element set: it is either the identity, (1), or the other element of $S_2$, the permutation
(12). From this point of view, the action of an (n, 2)-Hamming distance oracle depends on
a map $\{0, \dots, n\} \to S_2$: simply adding the Hamming distance into the response register
would be the map $d \mapsto (12)^d$, while the Algorithm A oracle action comes from the map
$d \mapsto (12)^{b_1(d)}$ (using cycle notation [20] for permutations of the elements of R, which we
label $\{1, \dots, r\}$).

But this implies a novel conceptualization of the action of an oracle when r > 2, as
it can be for (n, r)-Hamming distance oracles, namely that the action should depend on
a map $\sigma : \{0, \dots, n\} \to S_r$ which takes each function value computed by the oracle and
associates to it a permutation of a response set R with |R| = r. In a quantum algorithm,
R is identified with the computational basis of the tensor factor used as the response
register. The map σ can be more complicated than $d \mapsto (12 \dots r)^d$, i.e., addition of the
Hamming distance modulo r. This simple action can be characterized as an abelian oracle
since the range of σ is contained in a cyclic subgroup of $S_r$. It allows a to be identified
with probability 1 when r = 4 [1], but in other cases there is no reason to think that it
is the optimal action. In general we should consider non-abelian oracles, ones for which
the range of σ contains noncommuting permutations of R. More precisely, we define the
action of an oracle on $(\mathbb{C}^2)^{\otimes n} \otimes \mathbb{C}^r$ by

$$ O_\sigma(a)|x,b\rangle = |x, \sigma_{\mathrm{dist}(a,x)}(b)\rangle, \qquad (5.2) $$

and let

$$ p_n(r) = \max_{\substack{\sigma : \{0,\dots,n\} \to S_r \\ \psi \in \mathbb{C}^r,\ U \in U(2^n r)}} \frac{1}{2^n} \sum_{a=0}^{2^n-1} \sum_{b=0}^{r-1} \left| \langle a,b|\,U\,O_\sigma(a)\,|\eta\rangle \otimes \psi \right|^2. \qquad (5.3) $$

Using this notation, Hunziker and Meyer's result [1] shows that $p_n(r) = 1$ for $r \ge 4$,
Theorem 3.2 shows that $p_{2j}(r) = 1$ for $r \ge 2$ and every natural number j, and Lemma 3.1 and
Theorem 4.2 show that $p_{2j-1}(2) < 1$ for every natural number $j \ge 2$. Furthermore, the two
questions above can be phrased as: What are $p_{2j-1}(2)$ and $p_{2j-1}(3)$, respectively?



6 And Tapp, according to a note in [19], and most likely others as well.
6. Numerical optimization results

We are considering learning algorithms that send a single equal superposition query
$|\eta\rangle \otimes \psi$ to the oracle. If the states $\{O(c)|\eta\rangle \otimes \psi \mid c \in C\}$ are linearly independent and the
diagonal of the square root of their Gram matrix is constant, then the optimal measurement to
distinguish them, i.e., to identify c, is the square root measurement, as Sasaki et al. noted [21]
using early results in quantum state discrimination [14, Appendix A].⁷ Thus we have the following:

PROPOSITION 6.1. Let C be a Y-valued concept class of size M over a set X of size N. Fix
a response set R and an assignment σ of a permutation of R to each $y \in Y$. Also fix the
initial state ψ of the response register $\mathbb{C}^R$. Let B be the Nr by M matrix with columns
indexed by the concepts $c \in C$, and with column c the state $O(c)|\eta\rangle \otimes \psi$. Let $\sqrt{G}$ denote
the positive semi-definite square root of $G = B^\dagger B$. Suppose that the columns of B are linearly
independent, and that the diagonal elements of $\sqrt{G}$ are equal. Then the optimal single-query
quantum algorithm using the equal superposition query $|\eta\rangle \otimes \psi$ succeeds with probability equal
to the square of this common diagonal value.

This proposition justifies the main step in the following numerical method. 
Method B. 

1. Input n and r. 

2. Repeat Steps 3 and 4 below for all possible assignments $\sigma : \{0, \dots, n\} \to S_r$.

3. For $\psi \in \mathbb{C}^r$ a unit vector, Proposition 6.1 allows us to calculate the maximal
success probability $M(\psi)$ of a single-query quantum algorithm using the query
$|\eta\rangle \otimes \psi$.

4. Numerically maximize $M(\psi)$ over all unit vectors $\psi \in \mathbb{C}^r$.
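Proposition 6.1 and Method B can be sketched with NumPy as follows. This is our own illustration under stated assumptions: bit strings are integers, $\sigma_d$ is a tuple giving the image of each response label in {0, ..., r-1}, the square root of G is taken by eigendecomposition, and step 4 is replaced by a crude random search over ψ (the paper does not say which optimizer was used). Assignments whose B is rank deficient are skipped, as Proposition 6.1 requires. All function names are hypothetical.

```python
from itertools import permutations, product

import numpy as np


def oracle_states(n, sigma, psi):
    """Matrix B whose column a is O_sigma(a)|eta> (x) psi, for a in Z_2^n.

    Bit strings are encoded as integers; sigma[d][b] is the image of response
    label b under sigma_d; basis state |x, b> has row index x * r + b.
    """
    N, r = 1 << n, len(psi)
    B = np.zeros((N * r, N), dtype=complex)
    for a, x in product(range(N), repeat=2):
        perm = sigma[bin(a ^ x).count("1")]  # sigma_{dist(a, x)}
        for b in range(r):
            B[x * r + perm[b], a] += psi[b] / np.sqrt(N)
    return B


def srm_success(B):
    """Average success probability of the square-root measurement on B's columns.

    Column c is identified with probability ((sqrt G)_cc)^2, G = B^dagger B,
    averaged over a uniform prior.
    """
    G = B.conj().T @ B
    w, V = np.linalg.eigh(G)
    sqrt_G = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
    return float(np.mean(np.abs(np.diag(sqrt_G)) ** 2))


def method_b(n, r, samples=100, seed=0):
    """Crude stand-in for Method B: random search over unit psi for every sigma.

    Returns (best probability, sigma, psi).
    """
    rng = np.random.default_rng(seed)
    best = (0.0, None, None)
    for sigma in product(list(permutations(range(r))), repeat=n + 1):
        for _ in range(samples):
            psi = rng.normal(size=r) + 1j * rng.normal(size=r)
            psi /= np.linalg.norm(psi)
            B = oracle_states(n, sigma, psi)
            if np.linalg.matrix_rank(B) < B.shape[1]:
                continue  # Proposition 6.1 does not cover dependent columns
            p = srm_success(B)
            if p > best[0]:
                best = (p, sigma, psi)
    return best
```

As a consistency check, for n = 3, r = 2 and the assignment $\sigma_1 = (12)$, all other $\sigma_d = (1)$, a real ψ with $\psi^\dagger X \psi = -0.8$ makes `srm_success` equal to 0.8, in line with the value of $p_3(2)$ reported below. The additive action $d \mapsto (1234)^d$ with $\psi = (|0\rangle - i|1\rangle - |2\rangle + i|3\rangle)/2$ (labels 0 to 3) gives 1, as in [1].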
Using this method we obtain the following numerical results: 

First, let n = 3. 

For r = 2, we find $p_3(2) \approx 0.800$. This is achieved using the permutations
$\sigma_0 = \sigma_2 = \sigma_3 = (1)$ and $\sigma_1 = (12)$. It can also be achieved using the permutations
$\sigma_0 = \sigma_1 = \sigma_2 = (1)$ and $\sigma_3 = (12)$.

When r = 3, this improves to $p_3(3) \approx 0.974$. Here a best permutation assignment
is $\sigma_0 = (1)$, $\sigma_1 = (12)$, $\sigma_2 = (132)$, and $\sigma_3 = (123)$. (There are several other
assignments of permutations that yield the same success probability.)

Second, let n = 5. 

When r = 2, we find $p_5(2) \approx 0.721$. This is achieved using the permutations
$\sigma_0 = \sigma_3 = \sigma_4 = \sigma_5 = (1)$ and $\sigma_1 = \sigma_2 = (12)$.



7 The introduction of this approach into the context of concept learning may be found in [8].
When r = 3, this improves to $p_5(3) \approx 0.955$. Here the best permutation assignment
is $\sigma_0 = (1)$, $\sigma_1 = (123)$, $\sigma_2 = (132)$, $\sigma_3 = (12)$, $\sigma_4 = (1)$, and $\sigma_5 = (123)$.
The optimum initialization for the response register is approximately

$$ |1\rangle - 0.1065i\,|2\rangle + 1.1064i\,|3\rangle, $$

normalized to have unit length.



Note that Proposition 6.1 requires the columns of the matrix B to be linearly independent.
In cases where the columns of B are linearly dependent, Proposition 6.1 does
not tell us what to do, and Method B may not succeed in finding the optimal solution. In
our problem it turns out that certain assignments of permutations lead to matrices B with
linearly dependent columns. One suspects that these assignments are not as good as the
assignments for which B has full rank, but this is not guaranteed by Proposition 6.1. However,
when the rank of B is low, it is impossible to distinguish these states with high probability:⁸

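LEMMA 6.2. Suppose $\psi_i$, $i \in \{1, \dots, n\}$, are pure states contained in a k-dimensional
subspace W. Then any n-valued measurement for identifying i succeeds with probability
at most k/n (assuming each i is equally likely).

Proof. Let $\rho_i$ be the density matrix corresponding to $\psi_i$. Let $\Pi_W$ denote projection onto
W. Then $\rho_i \le \Pi_W$ for all i. Hence, if $\{X_i\}$ is any measurement, we can bound the success
probability of this measurement as follows:

$$ \frac{1}{n}\sum_{i=1}^{n} \operatorname{tr}(X_i \rho_i) \le \frac{1}{n}\sum_{i=1}^{n} \operatorname{tr}(X_i \Pi_W) = \frac{1}{n}\operatorname{tr}\Big(\Pi_W \sum_{i=1}^{n} X_i\Big) = \frac{1}{n}\operatorname{tr}(\Pi_W) = \frac{k}{n}. $$

∎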

Lemma 6.2 suffices to guarantee that cases in which B has linearly dependent columns 
yield success probabilities that are smaller than the ones presented in the list above. When 
n = 3 (for both r = 2 and r = 3), we find that a given assignment of permutations either
leads to a matrix B that is full rank (rank 8) for a generic choice of ψ, or has rank at
most 5. In this latter case, Lemma 6.2 implies that the success probability is at most 5/8,
which is smaller than the probabilities shown above in the full rank case. When n = 5,
B has either full rank (rank 32), or rank at most 22, which implies a success probability 
of at most 22/32 in the linearly dependent case. Again, this is smaller than the numbers 
reported above for the linearly independent case. 
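The rank dichotomy described above can be probed numerically. In the following sketch (ours, with hypothetical function names) the helper `oracle_states` builds the columns $O_\sigma(a)|\eta\rangle \otimes \psi$ of B, and `rank_profile` records the rank of B for every assignment σ at one random unit vector ψ. For a rank-deficient assignment of rank k, Lemma 6.2 bounds the success probability by $k/2^n$.

```python
from itertools import permutations, product

import numpy as np


def oracle_states(n, sigma, psi):
    """Columns O_sigma(a)|eta> (x) psi of B (bit strings as integers, sigma[d] a tuple)."""
    N, r = 1 << n, len(psi)
    B = np.zeros((N * r, N), dtype=complex)
    for a, x in product(range(N), repeat=2):
        perm = sigma[bin(a ^ x).count("1")]  # sigma_{dist(a, x)}
        for b in range(r):
            B[x * r + perm[b], a] += psi[b] / np.sqrt(N)
    return B


def rank_profile(n, r, seed=1):
    """Rank of B for every assignment sigma: {0, ..., n} -> S_r, at one random unit psi."""
    rng = np.random.default_rng(seed)
    psi = rng.normal(size=r) + 1j * rng.normal(size=r)
    psi /= np.linalg.norm(psi)
    return {sigma: int(np.linalg.matrix_rank(oracle_states(n, sigma, psi)))
            for sigma in product(list(permutations(range(r))), repeat=n + 1)}


def lemma_6_2_bound(rank, n):
    """Upper bound k / 2^n on the success probability when B has rank k."""
    return rank / 2 ** n
```

For instance, the trivial assignment (every $\sigma_d$ the identity) gives rank 1, and the additive action $d \mapsto (12)^d$ gives rank at most 2, since its states depend only on the parity of wt(a).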

7. Conclusions 

We have introduced a novel generalization for the action of oracles in quantum (and 
reversible classical) algorithms: the permutation model. For n-bit Hamming distance 



8 This is a broadly applicable result that may well exist in the literature, but we have been unable to
find it.
oracles the action is specified by a choice of map $\sigma : \{0, \dots, n\} \to S_r$ when the response
register has dimension r. The standard additive action of the oracle is described by the
map $\sigma(d) = (1 \dots r)^d$. Algorithm A in Theorem 3.2 demonstrates the striking improvement
possible by an oracle that acts by some other map of Hamming distances to permutations:
for r = 2 the success probability of learning from a single query to an oracle that acts
by the additive action is $1/2^{n-1}$, while for any even n it is 1 for an oracle that acts by
$\sigma(d) = (12)^{b_1(d)}$, and for n = 3 and n = 5 it is approximately 0.800 and 0.721, respectively,
using the actions listed in §6.

Allowing a larger response register, namely r = 3, improves the latter two probabilities
to approximately 0.974 and 0.955, respectively. In general, $p_n(r)$ is a nondecreasing
function of r. One might guess that if there is enough room in the response register to
encode each possible function value $y \in Y$ as a distinct permutation of $\{1, \dots, r\}$, then
adding additional dimensions to the response register would not improve the success
probability. This guess would mean that $p_n(r)$ would be constant for $r! \ge n + 1$. This is not
the case, however, as the n = 3 results show: $p_3(3) < 1$ while $p_3(4) = 1$.

As this counterexample indicates, the permutation model raises a host of new ques- 
tions. We close by listing a few more: Is there some dimension for the response register 
above which $p_n(r)$ is constant? Perhaps n + 1? What happens to $p_{2j-1}(r)$ as $j \to \infty$ for
fixed r? Does it decrease to 1/2? Or to something larger? What constitutes a good, or
optimal, choice of permutations and initial response register state? 